NCTU and NTUT's Entry to CLP-2014 Chinese Spelling Check Evaluation
Abstract
This paper describes our Chinese spelling check system submitted to the SIGHAN Bake-off 2014 evaluation. The system’s main components are still the conditional random field (CRF)-based word segmentation/part-of-speech (POS) tagger and the tri-gram language model (LM) used last year. This year, however, we refined the misspelling rules and the decision-making threshold to reduce the false alarm rate, and we sped up LM rescoring. The Bake-off 2014 evaluation results show that one of our systems (Run2) achieved reasonable performance, with accuracies of about 0.485/0.468 and F1 scores of 0.226/0.180 on the detection/correction metrics.
Related resources
Chinese Spelling Check System Based on Tri-gram Model
This paper describes our system in the Chinese spelling check (CSC) task of CLP-SIGHAN Bake-Off 2014. CSC is still an open problem today. To the best of our knowledge, n-gram language modeling (LM) is widely used in CSC because of its simplicity and fair predictive power. Our work in this paper continues this general line of research by using a tri-gram LM to detect and correct possible spellin...
Introduction to NJUPT Chinese Spelling Check Systems in CLP-2014 Bakeoff
Chinese spelling check (CSC) is an essential issue in the research field of Chinese language processing (CLP). This paper describes the details of two CSC systems we developed to solve this problem. The first system was built based on CRF model, and the modules of such system include word segmentation, error detection and error correction. Another system was based on 2Chars&&3-Chars model, and ...
NTOU Chinese Spelling Check System in CLP Bake-off 2014
This paper describes details of the NTOU Chinese spelling check system participating in the CLP2014 Bakeoff. Confusion sets were expanded by using two language resources, Shuowen and Four-Corner codes. A new method to find spelling errors in legal multi-character words was proposed. Comparison of sentence generation probabilities is the main information for error detection and correction. A rule-based c...
Overview of SIGHAN 2014 Bake-off for Chinese Spelling Check
This paper introduces a Chinese Spelling Check campaign organized for the SIGHAN 2014 bake-off, including task description, data preparation, performance metrics, and evaluation results based on essays written by Chinese as a foreign language learners. The hope is that such evaluations can produce more advanced Chinese spelling check techniques.
Chinese Spelling Check Evaluation at SIGHAN Bake-off 2013
This paper introduces an overview of Chinese Spelling Check task at SIGHAN Bake-off 2013. We describe all aspects of the task for Chinese spelling check, consisting of task description, data preparation, performance metrics, and evaluation results. This bake-off contains two subtasks, i.e., error detection and error correction. We evaluate the systems that can automatically point out the spelli...
Publication date: 2014